Design and Analysis of a Hardware CNN Accelerator

Authors

  • Kevin Kiningham
  • Michael Graczyk
  • Athul Ramkumar
Abstract

In recent years, convolutional neural networks (CNNs) have revolutionized computer vision tasks. However, inference in current CNN designs is extremely computationally intensive. This has led to an explosion of new accelerator architectures designed to reduce power consumption and latency [20]. In this paper, we design and implement a systolic-array-based architecture, which we call ConvAU, to efficiently accelerate dense matrix multiplication operations in CNNs. We also train an 8-bit quantized version of SqueezeNet [14] and evaluate our accelerator’s power consumption and throughput. Finally, we compare our results to the reported results for the K80 GPU and Google’s TPU. We find that ConvAU gives a 200x improvement in TOPs/W when compared to an NVIDIA K80 GPU and a 1.9x improvement when compared to the TPU.
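
To make the idea of a systolic array for dense matrix multiplication concrete, the following is a minimal software sketch, not the ConvAU design itself. It assumes an output-stationary dataflow (each processing element keeps one output accumulator), 8-bit signed operands, and wide accumulators; the abstract does not specify ConvAU's dataflow, array size, or accumulator width, so those choices, along with the function name systolic_matmul, are illustrative assumptions.

```python
def systolic_matmul(a, b):
    """Simulate an output-stationary systolic array computing a @ b.

    a is an M x K list of lists, b is K x N, all entries int8.
    Returns (product, cycles). This is an illustrative model only.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a) or any(len(row) != n for row in b):
        raise ValueError("shape mismatch")
    if any(not -128 <= v <= 127 for row in a + b for v in row):
        raise ValueError("operands must fit in 8-bit signed integers")

    acc = [[0] * n for _ in range(m)]    # one accumulator per processing element
    a_reg = [[0] * n for _ in range(m)]  # operand registers flowing left to right
    b_reg = [[0] * n for _ in range(m)]  # operand registers flowing top to bottom

    cycles = m + n + k - 2  # time for the last skewed operands to reach the far corner
    for t in range(cycles):
        # Shift A operands one column to the right; row i is fed with a delay of i cycles.
        for i in range(m):
            for j in range(n - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
            idx = t - i
            a_reg[i][0] = a[i][idx] if 0 <= idx < k else 0
        # Shift B operands one row down; column j is fed with a delay of j cycles.
        for j in range(n):
            for i in range(m - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
            idx = t - j
            b_reg[0][j] = b[idx][j] if 0 <= idx < k else 0
        # Every processing element performs one multiply-accumulate per cycle.
        for i in range(m):
            for j in range(n):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc, cycles
```

For example, systolic_matmul([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]]) returns ([[58, 64], [139, 154]], 5). The input skew is what lets each element talk only to its neighbours: operands with the same inner index meet at element (i, j) on cycle i + j + that index, so no global broadcast is needed.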

Similar resources

Sparsity Analysis of Deep Learning Models and Corresponding Accelerator Design on FPGA

Machine learning has achieved great success in recent years, especially deep learning algorithms based on artificial neural networks. However, these models need high performance and large memories, which makes them unsuitable for IoT devices, as IoT devices have limited performance and should be low-cost and energy-efficient. Therefore, it is necessary to optimize the deep l...

Compiling Deep Learning Models for Custom Hardware Accelerators

Convolutional neural networks (CNNs) are the core of most state-of-the-art deep learning algorithms specialized for object detection and classification. CNNs are both computationally complex and embarrassingly parallel, two properties that leave room for potential software and hardware optimizations for embedded systems. Given a programmable hardware accelerator with a CNN-oriented custom instr...

On-Chip CNN Accelerator for Image Super-Resolution

To implement convolutional neural networks (CNNs) in hardware, state-of-the-art CNN accelerators pipeline the computation and data transfer stages using an off-chip memory and execute them simultaneously on the same timeline. However, since a large number of feature maps generated during the operation must be transmitted to the off-chip memory, the pipeline stage length is determined by the of...

Computation Error Analysis of Block Floating Point Arithmetic Oriented Convolution Neural Network Accelerator Design

The heavy burdens of computation and off-chip traffic impede the deployment of large-scale convolutional neural networks on embedded platforms. Because CNNs tolerate computation errors well, employing block floating point (BFP) arithmetic in CNN accelerators could efficiently reduce hardware cost and data traffic while maintaining classification accuracy. In this paper, w...

PipeCNN: An OpenCL-Based FPGA Accelerator for Large-Scale Convolution Neuron Networks

Convolutional neural networks (CNNs) have been widely employed in many applications such as image classification, video analysis and speech recognition. Being compute-intensive, CNN computations are mainly accelerated by GPUs with high power dissipation. Recently, studies have explored FPGAs as CNN accelerators because of their reconfigurability and energy efficiency advantage over GP...

Publication date: 2017